Matrix-vector form of the Bellman equation

vπ(s)=E[Rt+1|St=s]+γE[Gt+1|St=s],=aAπ(a|s)rRp(r|s,a)rmean of immediate rewards+γaAπ(a|s)sSp(s|s,a)vπ(s)mean of future rewards=aAπ(a|s)[rRp(r|s,a)r+γsSp(s|s,a)vπ(s)],for all sS.(2.7)

The Bellman Equation in (2.7) is in an elementwise form. Since it is valid for every state, we can combine all these equations and write them concisely in a matrix- vector form, which will be frequently used to analyze the Bellman equation.

To derive the matrix- vector form, we first rewrite the Bellman equation in (2.7) as

vπ(s)=rπ(s)+γsSpπ(s|s)vπ(s)(2.8) rπ(s)aAπ(a|s)rRp(r|s,a)r,pπ(s|s)aAπ(a|s)p(s|s,a).

Suppose that the states are indexed as si with i=1,,n , where n=|S| . For state si , (2.8) can be written as

vπ(si)=rπ(si)+γsjSpπ(sj|si)vπ(sj).(2.9)

Let vπ=[vπ(s1),,vπ(sn)]TRn , rπ=[rπ(s1),,rπ(sn)]TRn , and PπRn×n with [Pπ]ij=pπ(sj|si) . Then, (2.9) can be written in the following matrix- vector form:

vπ=rπ+γPπvπ,(2.10)

where vπ is the unknown to be solved, and rπ,Pπ are known.

The matrix Pπ has some interesting properties. First, it is a nonnegative matrix, meaning that all its elements are equal to or greater than zero. This property is denoted as Pπ0 , where 0 denotes a zero matrix with appropriate dimensions. In this book, or represents an elementwise comparison operation. Second, Pπ is a stochastic matrix, meaning that the sum of the values in every row is equal to one. This property is denoted as Pπ1=1 , where 1=[1,,1]T has appropriate dimensions.

Consider the example shown in Figure 2.6. The matrix- vector form of the Bellman equation is

[vπ(s1)vπ(s2)vπ(s3)vπ(s4)]=[rπ(s1)rπ(s2)rπ(s3)rπ(s4)]+γ[pπ(s1|s1)pπ(s2|s1)pπ(s3|s1)pπ(s4|s1)pπ(s1|s2)pπ(s2|s2)pπ(s3|s2)pπ(s4|s2)pπ(s1|s3)pπ(s2|s3)pπ(s3|s3)pπ(s4|s3)pπ(s1|s4)pπ(s2|s4)pπ(s3|s4)pπ(s4|s4)][vπ(s1)vπ(s2)vπ(s3)vπ(s4)].

Substituting the specific values into the above equation gives

[vπ(s1)vπ(s2)vπ(s3)vπ(s4)]=[0.5(0)+0.5(1)111]+γ[00.50.50000100010001][vπ(s1)vπ(s2)vπ(s3)vπ(s4)].

It can be seen that Pπ satisfies Pπ1=1 .

Figure 2.6: An example for demonstrating the matrix-vector form of the Bellman equation.